Skip to content

Add 3D support to miss_forest_impute#1052

Open
sueoglu wants to merge 5 commits intomainfrom
longitudinal/issue-948
Open

Add 3D support to miss_forest_impute#1052
sueoglu wants to merge 5 commits intomainfrom
longitudinal/issue-948

Conversation

@sueoglu
Copy link
Copy Markdown
Collaborator

@sueoglu sueoglu commented Apr 16, 2026

fixes #948
Extends miss_forest_impute to handle 3D EHRData inputs

  • Removed @function_2D_only() decorator
  • If the input is 3D: flattens the layer to (n_obs * n_t, n_vars) (flattening along axis 0) before imputation and reshapes back to (n_obs, n_vars, n_t) afterwards
  • Added guard raising ValueError when input is 3D but no layer is specified
  • Added tests for 3D imputation

@sueoglu sueoglu marked this pull request as ready for review April 21, 2026 11:46
@sueoglu sueoglu requested a review from eroell April 21, 2026 11:46
Copy link
Copy Markdown
Collaborator

@eroell eroell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent as always!

Next to the comments, a question of interest:
Could you take physionet2012's 3D data, mask ~10% of the values, and impute them with a) missforest b) simple mean impute, and then compare which one is more accurate by comparing the mean of the imputed values with their true mean, and the standard deviation of the imputed values with their true standard deviation?

A code snippet doing so just for the PR comment would be great!

max_iter=max_iter,
random_state=random_state,
)
# not sure if this should be kept?
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be removed indeed, you're defining the Imputer in the single-dispatch.

further, there is in Line 568 an unused definition of RandomForestClassifier which you can also throw out.

"var_names parameter."
)
mtx = edata.X if layer is None else edata.layers[layer]
input_dtype = mtx.dtype if np.issubdtype(mtx.dtype, np.floating) else np.float64
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a quick comment on why this is needed here? :)

miss_forest_impute(edata_blob_small, layer="layer_2")
with pytest.raises(ValueError, match=r"only supports 2D data"):
miss_forest_impute(edata_blob_small, layer=DEFAULT_TEM_LAYER_NAME)
@pytest.mark.parametrize("edata_mini_3D_missing_values", [True], indirect=True)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add for the "basic" test a parametrization which also checks the array types, where dask raises a valueerror is checked, and it is also checked that this works with sparse (at least in the 2D case then)?



@_miss_forest_impute_function.register(np.ndarray)
@_miss_forest_impute_function.register(sp.csr_array)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is legit to consider sparse arrays for imputations and make them dense.

Could you mention this in the function docstring?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Longitudinal miss_forest_impute

2 participants